Automatic Segmentation and Summarization of Spoken Lectures

Author

  • Narine Kokhlikyan
Abstract

The ever-increasing number of online lectures has created an unprecedented opportunity for distance learning. Most online lectures are presented as unstructured text, audio and/or video files, which makes it difficult for students to locate relevant lectures and browse through them. In this thesis, we investigate several automatic lecture segmentation and summarization algorithms. Automatic lecture segmentation algorithms impose structure by inserting paragraph separators and section titles into lectures to make them more readable. The segmentation algorithms include K-Means, HMM, and Hierarchical Segmentation. Unlike segmentation, automatic lecture summarization algorithms compress lectures by selecting salient information based on the importance of words and phrases, and provide concise, coherent and readable summaries. The summarization algorithms were developed based on two supervised machine learning approaches: Support Vector Machine (SVM) and Conditional Random Fields (CRF). These approaches show results comparable to existing summarization methods. In this thesis we propose a novel Rhetorical Structure Index (RSI) to measure the structural importance of a feature. Experiments show that using RSI significantly improves segmentation accuracy compared to traditional content-based feature weighting schemes such as TF/IDF. We also build summarization systems with structure-dependent models, such that the summary generated for “Introduction” is different from those for “Main Lecture Content” and “Conclusion”. Of the three segmentation algorithms, HMM showed the best performance. However, like K-Means, HMM has a fixed number of segments, which is not the case for Hierarchical Segmentation. Experiments show that structure-dependent summarization outperforms the uniform summarization method by 15%.
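To make the content-based TF/IDF weighting mentioned in the abstract concrete, here is a minimal sketch of an unsupervised extractive baseline that scores sentences by TF-IDF and keeps the top-scoring ones. It is not the thesis's SVM, CRF, or RSI method. The function names (`tokenize`, `tfidf_sentence_scores`, `summarize`) and the choice to treat each sentence as one document for the IDF statistics are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_sentence_scores(sentences):
    """Score each sentence by the sum of the TF-IDF weights of its terms.

    Illustrative assumption: each sentence counts as one "document", TF is
    the relative frequency of a term within the sentence, and IDF is
    log(N / df) with N the number of sentences.
    """
    tokenized = [tokenize(s) for s in sentences]
    n = len(tokenized)
    # Document frequency: number of sentences that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        tf = Counter(tokens)
        score = sum(
            (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scores.append(score)
    return scores


def summarize(sentences, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = tfidf_sentence_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:k])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    lecture = [
        "the lecture covers hidden markov models",
        "the lecture is long",
        "the lecture is long",
    ]
    print(summarize(lecture, 1))
```

Terms that occur in every sentence get zero weight under this scheme, so the score is driven by words that distinguish one sentence from the rest. A structure-aware weighting such as the RSI described above would replace or complement these weights, but its definition is not given on this page.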

Similar resources

A browsing system for classroom lecture speech

Developing technologies to summarize and retrieve huge quantities of spoken documents, recorded during classroom lectures, for the purpose of e-Learning or self-learning is important. In this paper, we describe an adaptation method of a language model to recognize keywords in given slides. Next, we propose a summarization method for spoken classroom lectures using prosodic features and linguis...

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Intonational phrases for speech summarization

Extractive speech summarization approaches select relevant segments of spoken documents and concatenate them to generate a summary. The extraction unit chosen, whether a sentence, syntactic constituent, or other segment, has a significant impact on the overall quality and fluency of the summary. Even though sentences tend to be the choice of most of the extractive speech summarizers, in this paper...

Automatic extraction of cue phrases for important sentences in lecture speech and automatic lecture speech summarization

We automatically extract the summaries of spoken class lectures. This paper presents a novel method for sentence extraction-based automatic speech summarization. We propose a technique that extracts “cue phrases for important sentences (CPs)” that often appear in important sentences. We formulate CP extraction as a labeling problem of word sequences and use Conditional Random Fields (CRF) [1] f...

Mandarin Chinese Broadcast News Retrieval and Summarization Using Probabilistic Generative Models

This paper presents our recent research work on applying probabilistic generative models to Mandarin Chinese broadcast news retrieval and summarization. Most models can be trained in either a supervised or unsupervised manner. In addition, both literal term matching and concept matching strategies have been intensively investigated. This paper also presents a prototype web-based Mandarin Chines...

Publication date: 2014